Refactor tiny-model generation scripts#5637
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 730c87629a
    inputs = processor.apply_chat_template(
        conversation=messages,
        add_generation_prompt=True,
Use non-chat fallback for VLM smoke inputs
smoke_test unconditionally calls processor.apply_chat_template(...) for every processor with an image_processor, but some VLM checkpoints in this refactor are not chat models (for example google/paligemma-3b-pt-224, whose script calls smoke_test(model, processor) in paligemma_for_conditional_generation.py). In that case apply_chat_template can fail due to missing/unsupported chat templates, so the script aborts before push even though the model itself is valid with regular processor(text=..., images=...) inputs.
I think we could address this in the future; google/paligemma-3b-pt-224 is the only VLM that doesn't have a chat template.
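For reference, a fallback along the lines suggested in the review could look roughly like this. This is a hypothetical sketch, not code from the PR: the name `build_vlm_inputs` and the stub behavior are illustrative, and the real processors are `transformers` `ProcessorMixin` objects rather than the stand-ins used here.

```python
def build_vlm_inputs(processor, messages, text, images):
    """Prefer the chat template when the processor defines one; otherwise
    fall back to a plain text+image processor call (covers checkpoints
    like google/paligemma-3b-pt-224 that ship no chat template)."""
    if getattr(processor, "chat_template", None) is not None:
        return processor.apply_chat_template(
            conversation=messages,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt",
            padding=True,
        )
    # Non-chat fallback: regular processor(text=..., images=...) inputs.
    return processor(text=text, images=images, return_tensors="pt", padding=True)
```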
        return_dict=True,
        return_tensors="pt",
        padding=True,
    ).to(device)
smoke_test crashes for PaliGemma lacking chat template
Medium Severity
The smoke_test VLM branch unconditionally calls processor.apply_chat_template(...) for any ProcessorMixin. The PaliGemma script passes its processor to smoke_test, but google/paligemma-3b-pt-224 has no chat template defined, so apply_chat_template will raise at runtime, making the PaliGemma generation script unusable. The PR discussion confirms PaliGemma is the only VLM without a chat template.
Reviewed by Cursor Bugbot for commit 5cc7fc8.
    config = AutoConfig.from_pretrained(MODEL_ID, text_config=text_config, vision_config=vision_config)
    model = PaliGemmaForConditionalGeneration(config).to(dtype=torch.float32)
    smoke_test(model, processor)
PaliGemma smoke_test crashes due to missing chat template
Medium Severity
smoke_test(model, processor) passes a ProcessorMixin to smoke_test, which takes the VLM branch and calls processor.apply_chat_template(...). PaliGemma (google/paligemma-3b-pt-224) has no chat template, so this call will crash at runtime. The PR discussion confirms this: "google/paligemma-3b-pt-224 is the only VLM that doesn't have a chat template." The smoke_test function needs a fallback path for VLM processors that lack a chat template.
Reviewed by Cursor Bugbot for commit 4730fec.
albertvillanova left a comment
Thanks.
You replaced 450 code lines with 2,765. Are you sure this is the right direction?
On the other hand, what if we want to regenerate all models? Or some specific family of models? After this PR there is no simple way other than running each model individually.
The pin of transformers for the lowest supported version is repeated in every script. Why not default to the lowest supported version if no explicit version is set for a specific model?
Also, I think create_pr should be the default.
Honestly, in this case, I don't think it's a big deal. Generally speaking, these scripts are written and run just once, and then we keep them mostly for reference. It's not something that requires a lot of maintenance.
This is how I see things:
the old approach was designed to generate all models. But in practice we never generate all models.
right, I'll do this
In practice, I often needed to run the script a few times to iterate on and align the configs before the diff is in a good state. Only once everything looks right do I actually want to open the PR. Maybe there is a confusion: when
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit 385ac24.
Split tiny-model generation into per-model scripts
Replace the monolithic `scripts/generate_tiny_models.py` (one 437-line file: a big tuple-driven loop plus a growing `if issubclass(...)` ladder for VLMs) with a per-model layout: one script per tiny model. Each script is fully self-contained. Shared logic (push, smoke test, diff, weight init) stays in `_common.py`.

Why
`out_hidden_size`, Qwen3-VL `layer_types` deletion, Qwen3.5 linear-attn fp32 cast, Gemma4 in-place mutation, llava-v1.6 dtype hotfix, …). With one file per model, each script reads top-to-bottom in 20–50 lines. Model-specific quirks stay scoped to the model that needs them.
New features added to every script
- `print_config_diff(MODEL_ID, model)` prints every flat-key difference between the reference Hub config and the tiny model's config before push. Makes it obvious when a shrink kwarg was silently ignored or when an unexpected field drifted.
- `check_dtype_pattern(MODEL_ID, model)` reads the reference safetensors header via the Hub API (no weight download) and flags any tensor whose dtype diverges from the reference — catches cases like models with mixed-precision weights (e.g. fp32 norms inside a bf16 checkpoint).
- Every script declares `TRANSFORMERS_VERSION = "X.Y.Z"` and calls `check_transformers_version(...)`, which raises unless the installed version matches exactly. The pinned value is `max(introduction_version, trl_floor=4.56.2)`. Rationale: transformers is backward-compatible (a checkpoint saved by X loads on any ≥ X) but not forward-compatible; TRL CI runs against the floor, so tiny models must be saved with the oldest version that supports them to avoid config-field drift. Exact match prevents accidental regenerations with a newer transformers from silently breaking min-version CI.
- `--create-pr` flag. When a tiny model already exists on the Hub, the default is to skip the push. Passing `--create-pr` opens a single PR against the existing repo instead (all artifacts bundled into one commit via `HfApi.create_commit`), so updates can be reviewed before landing on `main`.

How to run
See `scripts/generate_tiny_models/README.md` for full documentation.

Scope
This PR is refactor-only — the Python logic for each tiny model is preserved exactly. No tiny model on the Hub is regenerated by this PR; the existing Hub repos remain the source of truth for CI.
Follow-up PRs will use the new scripts to regenerate and push individual tiny models where the existing Hub checkpoint drifts from the reference (e.g. non-size config fields defaulting to wrong values, missing upstream-added fields, quantization parity, etc.). Each regeneration is one PR per tiny model, with a `refs/pr/N` override in `tests/conftest.py` until merged on the Hub.

Note
Low Risk
Changes are confined to developer-facing Hub upload scripts and do not affect TRL runtime or test execution unless these scripts are manually run; main risk is accidental behavior drift when regenerating/pushing tiny models.
Overview
Refactors tiny-model generation by deleting the monolithic `scripts/generate_tiny_models.py` and replacing it with a package of per-model scripts under `scripts/generate_tiny_models/` (grouped by `for_causal_lm`, `for_sequence_classification`, and `for_conditional_generation`).

Adds shared helpers in `_common.py` to enforce an exact `transformers` version pin per script, run a minimal forward-pass smoke test, compare dtype patterns against the reference safetensors metadata, print config diffs vs the reference Hub config, and push all artifacts in a single Hub commit with optional `--create-pr` behavior (skip by default if the repo exists). Documentation for running and version pinning is added in `scripts/generate_tiny_models/README.md`.

Reviewed by Cursor Bugbot for commit 36baabe.
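The no-download dtype comparison mentioned above relies on the public safetensors layout: an 8-byte little-endian header length followed by a JSON table of tensor metadata. A minimal stdlib-only sketch of that idea follows; the names `dtypes_from_safetensors_header` and `dtype_mismatches` are illustrative, and the real `check_dtype_pattern` helper fetches the header bytes via the Hub API rather than from a local buffer.

```python
import json
import struct

def dtypes_from_safetensors_header(raw: bytes) -> dict[str, str]:
    """Map tensor name -> dtype from the leading bytes of a .safetensors file.

    The format starts with an 8-byte little-endian u64 giving the JSON
    header length, then the header itself; weights follow and are never read.
    """
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8 : 8 + header_len])
    return {
        name: meta["dtype"]
        for name, meta in header.items()
        if name != "__metadata__"  # skip the optional metadata entry
    }

def dtype_mismatches(reference: dict[str, str], tiny: dict[str, str]) -> list[str]:
    """Tensor names whose dtype diverges from the reference checkpoint,
    e.g. fp32 norms inside an otherwise bf16 model."""
    return sorted(
        name for name, dtype in tiny.items()
        if name in reference and reference[name] != dtype
    )
```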